Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search
One-shot weight-sharing methods have recently drawn great attention in neural architecture search due to their high efficiency and competitive performance. However, weight sharing across models has an inherent deficiency: insufficient training of subnetworks in the hypernetwork. To alleviate this problem, we present a simple yet effective architecture distillation method. The central idea is that subnetworks can learn collaboratively and teach each other throughout the training process, with the aim of boosting the convergence of individual models. We introduce the concept of the prioritized path, which refers to architecture candidates that exhibit superior performance during training.
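The prioritized-path idea can be sketched as a running top-k board of subnetworks, updated as the supernet trains; the strongest board member can then serve as the distillation teacher. The class below is an illustrative sketch of that bookkeeping only (the class name, methods, and `k=10` default are assumptions, not the authors' code):

```python
import heapq


class PrioritizedPathBoard:
    """Keep the top-k architecture candidates seen during supernet training.

    Hypothetical helper illustrating the 'prioritized path' idea: a path whose
    validation score beats the weakest board member replaces it, so the board
    always holds the best-performing subnetworks found so far.
    """

    def __init__(self, k=10):
        self.k = k
        self._heap = []   # min-heap of (score, counter, path); weakest on top
        self._count = 0   # insertion counter, breaks ties between equal scores

    def update(self, path, score):
        entry = (score, self._count, path)
        self._count += 1
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
        elif score > self._heap[0][0]:
            # new path beats the weakest board member; swap it in
            heapq.heapreplace(self._heap, entry)

    def best(self):
        """Highest-scoring path on the board (the distillation teacher)."""
        return max(self._heap)[2] if self._heap else None

    def members(self):
        """All board paths, best first."""
        return [p for _, _, p in sorted(self._heap, reverse=True)]
```

In a training loop one would call `update` with each sampled subnetwork's validation score and distill from `best()`.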
Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS
Neural Architecture Search (NAS) has shown great potential in finding better neural network designs. Sample-based NAS is the most reliable approach, aiming to explore the search space and evaluate the most promising architectures, but it is computationally very costly. As a remedy, the one-shot approach has emerged as a popular technique for accelerating NAS via weight sharing. However, because weights are shared across vastly different networks, the one-shot approach is less reliable than the sample-based approach. In this work, we propose BONAS (Bayesian Optimized Neural Architecture Search), a sample-based NAS framework accelerated by weight sharing to evaluate multiple related architectures simultaneously.
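A minimal sketch of a BONAS-style loop, under the assumption of a generic surrogate model and a batch evaluator (all function and parameter names here are illustrative, not the authors' code): a surrogate is fit on architectures evaluated so far and proposes the next batch, which in the paper would be evaluated jointly under shared weights; here a plain scoring callback stands in for that step.

```python
import random


def bonas_search(candidates, evaluate, surrogate_fit, surrogate_predict,
                 batch_size=4, rounds=3):
    """Toy sketch of surrogate-guided, batched NAS (names are illustrative).

    evaluate(arch)          -> accuracy of one architecture (in BONAS this
                               would be a one-shot evaluation of the batch
                               under shared weights)
    surrogate_fit(data)     -> surrogate model from {arch: accuracy} pairs
    surrogate_predict(m, a) -> surrogate's predicted score for arch a
    """
    evaluated = {}
    pool = list(candidates)

    # Bootstrap the surrogate with one randomly chosen batch.
    for a in random.sample(pool, batch_size):
        evaluated[a] = evaluate(a)

    for _ in range(rounds):
        model = surrogate_fit(evaluated)
        remaining = [a for a in pool if a not in evaluated]
        if not remaining:
            break
        # Pick the batch the surrogate scores highest (pure exploitation
        # here; the real method balances exploration as well).
        batch = sorted(remaining,
                       key=lambda a: surrogate_predict(model, a),
                       reverse=True)[:batch_size]
        for a in batch:
            evaluated[a] = evaluate(a)

    return max(evaluated, key=evaluated.get)
```

With a perfect surrogate this converges on the best candidate quickly; the interesting regime is a noisy surrogate, where batch size and exploration strategy matter.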
Review for NeurIPS paper: Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search
Weaknesses: The search space is not the same as in the Google publications but similar to Once-for-All: the SE ratio is 0.25 in this paper's code, the expansion rates are {4, 6}, and the maximum depth is 5 in every stage, which is slightly different. Thus, please report #params in Tab. 1. L120: in this paper, the authors use 2K images as the validation set (L212) and use the validation loss to train the meta-network M. The authors claim that this step is time-consuming (L159); how many iterations in total are used for updating M? The Kendall rank correlation is important, and I would prefer to see more such results.
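The Kendall rank correlation the reviewer asks for measures how well supernet-estimated scores preserve the ground-truth ranking of architectures. A naive O(n²) sketch (ties counted as neither concordant nor discordant, i.e. the tau-a variant) could look like:

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall rank correlation between two equal-length score lists.

    In NAS evaluation, x would be supernet-estimated accuracies and y the
    stand-alone accuracies of the same architectures; +1 means identical
    ranking, -1 a fully reversed one.
    """
    assert len(x) == len(y) and len(x) >= 2
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1   # pair ordered the same way in both lists
        elif s < 0:
            discordant += 1   # pair ordered oppositely
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs
```

For large benchmarks one would normally use an O(n log n) implementation such as `scipy.stats.kendalltau` instead.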
Review for NeurIPS paper: Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS
Additional Feedback: Overall I think this paper is strong enough to recommend acceptance; the ideas are interesting and well motivated, and the evaluation across benchmarks is reasonably thorough. Misc questions:
- For the GCN, were alternatives to a global node considered? For example, it is common to see pooling across all nodes used to get a final embedding.
- How was 100 decided upon for the number of candidates to test at once? It would be interesting to see how changing this number affects the sampling efficiency, quality, and runtime of the search.
- Were weights preserved across sampling rounds as in ENAS, or reinitialized each time? The trade-off/reliability of weight sharing in this case seems like it would differ from the impact of weight sharing when considering a simultaneous pool of candidates.
- Could the EA used to produce candidates be clarified? There wasn't much discussion of why it was used or of the degree to which it helped over randomly sampling candidates.
- The correlations reported in Table 1 are good, but it seems useful to quantify the quality of the model's scoring estimates as the search progresses. At initialization it is guiding the search having seen only a small pool of architectures; how good is the correlation at the beginning, and how does it improve over the course of the search? If the search were run again from scratch, how consistent would it be?
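The pooling alternative the reviewer mentions replaces a dedicated global node with an aggregate over all node embeddings. A minimal pure-Python sketch of mean pooling (a hypothetical helper for illustration, not BONAS code; embeddings are plain lists of floats):

```python
def mean_pool(node_embeddings):
    """Graph-level embedding as the element-wise mean of node embeddings.

    Instead of reading the graph representation off one designated global
    node, average every node's embedding; each output dimension is the
    mean of that dimension across all nodes.
    """
    n = len(node_embeddings)
    dim = len(node_embeddings[0])
    return [sum(emb[d] for emb in node_embeddings) / n for d in range(dim)]
```

Max- or sum-pooling are drop-in variants; the choice mainly affects how sensitive the graph embedding is to a few extreme nodes.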
Review for NeurIPS paper: Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS
Overall, the reviewers found the key ideas in the paper novel and well motivated. I support the reviewers' request for ablation studies to better disentangle the relative contributions of the different components and the impact of different hyperparameters. Finally, please include standard deviations in Table 2. They are readily available for the methods you are comparing against, and the differences between methods are small enough that it would be good to have an idea of the variation across seeds.